Technical Report 



IDSIA-10-06 



A Formal Measure of Machine Intelligence 



Shane Legg and Marcus Hutter 



IDSIA, Galleria 2, CH-6928 Manno-Lugano, Switzerland 
{shane,marcus}@idsia.ch    http://www.idsia.ch/

14 April 2006 



Abstract 

A fundamental problem in artificial intelligence is 
that nobody really knows what intelligence is. The 
problem is especially acute when we need to con- 
sider artificial systems which are significantly dif- 
ferent to humans. In this paper we approach this 
problem in the following way: We take a number 
of well known informal definitions of human intelli- 
gence that have been given by experts, and extract 
their essential features. These are then mathemat- 
ically formalised to produce a general measure of 
intelligence for arbitrary machines. We believe that 
this measure formally captures the concept of ma- 
chine intelligence in the broadest reasonable sense. 

1 Introduction 

Most of us think that we recognise intelligence when 
we see it, but we are not really sure how to pre- 
cisely define or measure it. We informally judge 
the intelligence of others by relying on our past ex- 
periences in dealing with people. Naturally, this 
naive approach is highly subjective and imprecise. 
A more principled approach would be to use one 
of the many standard intelligence tests that are 
available. Contrary to popular wisdom, these tests, 
when correctly applied by a professional, deliver 
statistically consistent results and have considerable 
power to predict the future performance of individ- 
uals in many mentally demanding tasks. However, 
while these tests work well for humans, if we wish 
to measure the intelligence of other things, perhaps 
of a monkey or a new machine learning algorithm, 
they are clearly inappropriate. 

One response to this problem might be to develop specific kinds of
tests for specific kinds of entities, just as intelligence tests for
children differ from intelligence tests for adults. While this works
well when testing humans of different ages, it comes undone when we
need to measure the intelligence of entities which are profoundly
different to each other in terms of their cognitive capacities, speed,
senses, environments in which they operate, and so on. To measure the
intelligence of such diverse systems in a meaningful way we must step
back from the specifics of particular systems and establish the
underlying fundamentals of what it is that we are really trying to
measure. That is, we need to establish a notion of intelligence that
goes beyond the specifics of particular kinds of systems.

The difficulty of doing this is readily apparent. 
Consider, for example, the memory and numerical 
computation tasks that appear in some intelligence 
tests and which were once regarded as defining hall- 
marks of human intelligence. We now know that 
these tasks are absolutely trivial for a machine and 
thus do not test the machine's intelligence. Indeed 
even the mentally demanding task of playing chess 
has been largely reduced to brute force search. As 
technology advances, our concept of what intelli- 
gence is continues to evolve with it. 

How then are we to develop a concept of intelli- 
gence that is applicable to all kinds of systems? Any 
proposed definition must encompass the essence of 
human intelligence, as well as other possibilities, in 
a consistent way. It should not be limited to any 
particular set of senses, environments or goals, nor 
should it be limited to any specific kind of hard- 
ware, such as silicon or biological neurons. It should 
be based on principles which are sufficiently funda- 
mental so as to be unlikely to alter over time. Fur- 
thermore, the intelligence measure should ideally be 
formally expressed, objective, and practically real- 
isable. 

This paper approaches this problem in the following way. In Section 2
we consider a range of definitions of human intelligence that have
been put forward by well known psychologists. From these we extract
the most common and essential features and use them to create an
informal definition of intelligence. Section 3 then introduces the
framework which we use to construct our formal measure of
intelligence. This framework is formally defined in Section 4. In
Section 5 we use our developed formalism to produce a formal
definition of intelligence. The final section closes with a short
summary.

A preliminary sketch of the ideas in this paper appeared in the
poster [LH05]. It can be shown that the intelligence measure presented
here is in fact a variant of the Intelligence Order Relation that
appears in the theory of AIXI, the provably optimal universal agent
[Hut04]. A long journal version of
this paper is being written in which we give the pro- 
posed measure of machine intelligence and its rela- 
tion to other such tests a much more comprehensive 
treatment. 

Naturally, we expect such a bold initiative to be 
met with resistance. However, we hope that the 
reader will appreciate the value of our approach: 
With a formally precise definition put forward we 
aim to better our understanding of what is a noto- 
riously subjective and slippery concept. 



2 The concept of intelligence 

Although definitions of human intelligence given by experts in the
field vary, most of their views cluster around a few common
perspectives. Perhaps the most common perspective, roughly stated, is
to think of intelligence as being the ability to successfully operate
in uncertain environments by learning and adapting based on
experience. The following often quoted definitions all express this
notion of intelligence, but with a different emphasis in each case:

• "The capacity to learn or to profit by experi- 
ence." - W. F. Dearborn 

• "Ability to adapt oneself adequately to relatively
new situations in life." - R. Pintner

• "A person possesses intelligence insofar as he 
has learned, or can learn, to adjust himself to 
his environment." - S. S. Colvin 

• "We shall use the term 'intelligence' to mean 
the ability of an organism to solve new prob- 
lems. ..." - W. V. Bingham 

• "A global concept that involves an individ- 
ual's ability to act purposefully, think ratio- 
nally, and deal effectively with the environ- 
ment." - D. Wechsler 



• "Intelligence is a very general mental capability that,
among other things, involves the ability to reason, plan,
solve problems, think abstractly, comprehend complex ideas,
learn quickly and learn from experience."
- L. S. Gottfredson and 52 expert signatories

These definitions have certain common features; in some cases they
are explicitly stated, while in others they are more implicit.
Perhaps the most elementary feature is that intelligence is seen as a
property of an entity which is interacting with an external
environment, problem or situation. Indeed this much is common to
practically all proposed definitions of intelligence. As we will be
referring back to these concepts regularly, we will refer to the
entity whose intelligence is in question as the agent, and the
external environment, problem or situation that it faces as the
environment. An environment could be a large complex world in which
the agent exists, similar to the usual meaning, or something as
narrow as a game of tic-tac-toe.

The second common feature of these definitions is that an agent's
intelligence is related to its ability to succeed in an environment.
This implies that the agent has some kind of an objective. Perhaps we
could consider an agent intelligent, in an abstract sense, without
having any objective. However, without any objective whatsoever, the
agent's intelligence would have no observable consequences.
Intelligence then, at least the concrete kind that interests us,
comes into effect when an agent has an objective to apply its
intelligence to. Here we will refer to this as its goal.

The emphasis on learning, adaptation and experience in these
definitions implies that the environment is not fully known to the
agent and may contain surprises and new situations which could not
have been anticipated in advance. Thus intelligence is not the
ability to deal with one fixed and known environment, but rather the
ability to deal with some range of possibilities which cannot be
wholly anticipated. This means that an intelligent agent may not be
the best possible in any specific environment, particularly before it
has had sufficient time to learn. What is important is that the agent
is able to learn and adapt so as to perform well over a wide range of
specific environments.

Although there is a great deal more to this topic than we have
presented here, the above brief analysis gives us the necessary
building blocks for our informal working definition of intelligence:

    Intelligence measures an agent's ability to
    achieve goals in a wide range of environments.

We realise that some researchers who study intelligence will take
issue with this definition. Given the diversity of views on the
nature of intelligence, a debate which is still being fought, this is
unavoidable. Nevertheless, we are confident that our proposed
informal working definition is fairly mainstream. We also believe
that our definition captures what we are interested in achieving in
machines: a very general and flexible capacity to succeed when faced
with a wide range of problems and situations.
Even those who subscribe to different perspectives 
on the nature and correct definition of intelligence 
will surely agree that this is a central objective for 
anyone wishing to extend the power and usefulness 
of machines. It is also a definition that can be suc- 
cessfully formalised. 

3 The agent-environment 
framework 

In the previous section we identified three essential 
components for our model of intelligence: An agent, 
an environment, and a goal. Clearly, the agent and 
the environment must be able to interact with each 
other; specifically, the agent needs to be able to 
send signals to the environment and also receive 
signals being sent from the environment. Similarly 
the environment must be able to receive and send 
signals to the agent. In our terminology we will 
adopt the agent's perspective on these communica- 
tions and refer to the signals from the agent to the 
environment as actions, and the signals from the 
environment as perceptions. 

What is missing from this setup is the goal. As 
discussed in the previous section, our definition of 
an agent's intelligence requires there to be some 
kind of goal for the agent to try to achieve. This 
implies that the agent somehow knows what the 
goal is. One possibility would be for the goal to be 
known in advance and for this knowledge to be built 
into the agent. The problem with this however is 
that it limits each agent to just one goal. We need 
to allow agents which are more flexible than this. 

If the goal is not known in advance, the other 
alternative is to somehow inform the agent of what 
the goal is. For humans this is easily done using 
language. In general however, the possession of a 
sufficiently high level of language is too strong an 
assumption to make about the agent. Indeed, even 
for something as intelligent as a dog or a cat, direct 
explanation will obviously not work. 

Fortunately there is another possibility. We can define an additional
communication channel with the simplest possible semantics: a signal
that indicates how good the agent's current situation is. We will
call this signal the reward. The agent's goal is then simply to
maximise the amount of reward it receives, so in a sense its goal is
fixed. This is not limiting though as we have not said anything about
what causes different levels of reward to occur. In a complex setting
the agent might be rewarded for winning a game or solving a difficult
puzzle. From a broad perspective then, the goal is flexible. If the
agent is to succeed in its environment, that is, receive a lot of
reward, it must learn about the structure of the environment and in
particular what it needs to do in order to get reward.

Figure 1: The agent and the environment interact by sending action,
observation and reward signals to each other.

Not surprisingly, this is exactly the way in which 
we condition an animal to achieve a goal: by se- 
lectively rewarding certain behaviours. In a narrow 
sense the animal's goal is fixed, perhaps to get more 
treats to eat, but in a broader sense this may require 
doing a trick or solving a puzzle. 

In our framework we will include the reward signal as a part of the
perception generated by the environment. The perceptions also contain
a non-reward part, which we will refer to as observations. This now
gives us the complete system of interacting agent and environment in
Figure 1. The goal, in the broad flexible sense, is implicitly
defined by the environment as this is what defines when rewards are
generated. Thus in this framework, to test an agent in any given way,
it is sufficient to fully define the environment.
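
As an illustration only, the interaction of agent and environment can be sketched in Python. The names `Perception`, `ToyEnvironment`, `RandomAgent` and `interact`, as well as the toy temperature dynamics, are assumptions made for this sketch and are not part of the framework itself. The environment sends the first perception, which matches the turn order used in the formal framework.

```python
import random
from dataclasses import dataclass


@dataclass(frozen=True)
class Perception:
    """A perception: a non-reward observation plus a reward."""
    observation: str
    reward: float


class ToyEnvironment:
    """Hypothetical environment that rewards the agent for staying warm."""

    STATES = ["cold", "warm", "hot"]
    REWARDS = {"cold": 0.0, "warm": 1.0, "hot": 0.3}

    def __init__(self):
        self.level = 0  # index into STATES, starts at "cold"

    def perceive(self) -> Perception:
        state = self.STATES[self.level]
        return Perception(state, self.REWARDS[state])

    def act(self, action: str) -> None:
        # "heat" raises the temperature, anything else lowers it.
        step = 1 if action == "heat" else -1
        self.level = min(max(self.level + step, 0), len(self.STATES) - 1)


class RandomAgent:
    """Agent that ignores the history and acts uniformly at random."""

    ACTIONS = ["heat", "cool"]

    def __init__(self, seed: int = 0):
        self.rng = random.Random(seed)

    def choose(self, history) -> str:
        return self.rng.choice(self.ACTIONS)


def interact(agent, env, cycles: int):
    """Run the perception/action loop and return (history, total reward)."""
    history, total = [], 0.0
    for _ in range(cycles):
        perception = env.perceive()      # environment -> agent
        history.append(perception)
        total += perception.reward
        action = agent.choose(history)   # agent -> environment
        history.append(action)
        env.act(action)
    return history, total
```

Note that the goal never appears in the agent's code: it is defined entirely by how `ToyEnvironment` assigns rewards, which is the point made in the paragraph above.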

In artificial intelligence, this framework is used 
in the area of reinforcement learning [SB98]. By
appropriately renaming things, it also describes the 
controller-plant framework used in control theory. 
It is a widely used and very general structure that 
can describe seemingly any kind of learning or con- 
trol problem. The interesting point for us is that 
this type of framework follows naturally from our 
informal definition of intelligence. The only diffi- 
culty was how to deal with the notion of success, or 
profit. This requires the existence of some kind of 
objective or goal, and the most flexible and elegant 
way to bring this into our framework is by using a 
simple reward signal.

4 A formal framework for intelligence

Having made the basic framework explicit, we can now formalise
things. See [Hut04] for a more complete technical description along
with many more example agents and environments.

The agent sends information to the environment by sending symbols
from some finite set, for example,
$\mathcal{A} := \{left, right, forwards, backwards\}$. We will call
this set the action space and denote it by $\mathcal{A}$. Similarly,
the environment sends signals to the agent with symbols from a finite
set called the perception space, which we will denote $\mathcal{P}$.
The reward space, denoted by $\mathcal{R}$, will always be a finite
subset of the rational unit interval $[0, 1] \cap \mathbb{Q}$. Every
perception consists of two separate parts: an observation and a
reward. For example, we might have
$\mathcal{P} := \{(cold, 0.0), (warm, 1.0), (hot, 0.3), (roasting, 0.0)\}$.

To denote symbols being sent we will use the lower case variable
names $a$, $o$ and $r$ for actions, observations and rewards
respectively. We will also index these in the order in which they
occur, thus $a_1$ is the agent's first action, $a_2$ is the second
action and so on. The agent and the environment will take turns at
sending symbols, starting with the environment. This produces a
history of observations, rewards and actions which we will denote by
$o_1 r_1 a_1 o_2 r_2 a_2 o_3 r_3 a_3 o_4 \ldots$. Our restriction to
finite action and perception spaces is deliberate as an agent should
not be able to receive or generate information without bound in a
single cycle in time. Of course, the action and perception spaces can
still be extremely large, if required.

Formally, the agent is a function, denoted by $\pi$, which takes the
current history as input and chooses the next action as output. A
convenient way of representing the agent is as a probability measure
over actions conditioned on the current history. Thus
$\pi(a_2 | o_1 r_1 a_1 o_2 r_2)$ is the probability of action $a_2$ in
the second cycle, given that the current history is
$o_1 r_1 a_1 o_2 r_2$. A deterministic agent is simply one that
always assigns a probability of 1 to some action for any given
history. How the agent produces the distribution over actions for any
given history is left completely open. Of course in artificial
intelligence the agent will be a machine and so $\pi$ will be a
computable function.

The environment, denoted $\mu$, is defined in a similar way.
Specifically, for any $k \in \mathbb{N}$ the probability of
$o_k r_k$, given the current history
$o_1 r_1 a_1 \ldots o_{k-1} r_{k-1} a_{k-1}$, is
$\mu(o_k r_k | o_1 r_1 a_1 \ldots o_{k-1} r_{k-1} a_{k-1})$. For the
moment we will not place any further restrictions on the environment.
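
To make the two conditional measures concrete, the following Python sketch represents an agent and an environment as functions that return a probability for the next symbol given the history so far, and multiplies these terms to obtain the probability of a whole finite history. The functions `pi_uniform`, `mu_echo` and `history_probability` are hypothetical names chosen for this sketch, and the convention that the very first reward is 0 is an assumption made only so that the toy environment is fully specified.

```python
from fractions import Fraction

ACTIONS = (0, 1)


def pi_uniform(action, history):
    """Agent pi(a_k | history): uniform over ACTIONS, ignoring the history."""
    return Fraction(1, len(ACTIONS))


def mu_echo(perception, history):
    """Environment mu(o_k r_k | history): the reward echoes the last action.

    The observation is always the empty string. With an empty history
    the reward is assumed to be 0 (an assumption of this sketch).
    """
    observation, reward = perception
    if observation != "":
        return Fraction(0)
    previous_action = history[-1] if history else 0
    return Fraction(1) if reward == previous_action else Fraction(0)


def history_probability(pi, mu, history):
    """Probability of a flat history [(o1, r1), a1, (o2, r2), a2, ...].

    Even positions hold perceptions (environment's turn), odd positions
    hold actions (agent's turn), so the factors alternate between mu and pi.
    """
    probability = Fraction(1)
    for index, symbol in enumerate(history):
        past = history[:index]
        factor = mu(symbol, past) if index % 2 == 0 else pi(symbol, past)
        probability *= factor
    return probability
```

For example, the history `[("", 0), 1, ("", 1), 0, ("", 0)]` contains two agent turns, each with probability 1/2, and three environment turns that are consistent with the echo rule, so its probability under this pair is 1/4.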

Our next task is to formalise the idea of "profit" or "success" for
an agent. Informally, we know that the agent must try to maximise the
amount of reward it receives, however this could mean several
different things.

Example. Define the reward space $\mathcal{R} := \{0, 1\}$, an action
space $\mathcal{A} := \{0, 1\}$ and an observation space that just
contains the null string, $\mathcal{O} := \{\epsilon\}$. Now define a
simple environment,

$$\mu(r_k | o_1 \ldots a_{k-1}) := 1 - |r_k - a_{k-1}|.$$

As the agent always gets a reward equal to its action, the optimal
agent for this environment is clearly
$\pi_{opt}(a_k | o_1 \ldots r_k) := a_k$. Consider now two other
agents for this environment,
$\pi_1(a_k | o_1 \ldots r_k) := \frac{1}{2}$ and

$$\pi_2(a_k | o_1 \ldots r_k) :=
\begin{cases}
1 & \text{for } a_k = 0 \wedge k \le 100, \\
1 & \text{for } a_k = 1 \wedge 100 < k \le 5000, \\
\frac{1}{2} & \text{for } 5000 < k, \\
0 & \text{otherwise.}
\end{cases}$$



For $1 \le k \le 100$ the expected reward per cycle for $\pi_1$ is
higher than it is for $\pi_2$. Thus in the short term $\pi_1$ is the
most successful. On the other hand, for $100 < k \le 5000$, $\pi_2$
has switched to the optimal strategy of always taking action 1, while
$\pi_1$ has not. Thus in the medium term $\pi_2$ is more successful.
Finally, for $k > 5000$, both agents use random actions and thus in
the limit they are equally successful.

Which is the better agent? If you want to maximise short term
rewards, it is agent $\pi_1$. If you want to maximise medium term
rewards, then it is agent $\pi_2$. And if you only care about the long
run, both agents are equally successful. Which agent you prefer
depends on your temporal preferences, something which is currently
outside of our formulation.
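
The comparison in this example can be checked numerically. The sketch below assumes, as the discussion implies, that the second agent always takes action 0 up to cycle 100, always takes action 1 from cycle 101 to cycle 5000, and acts uniformly at random afterwards. Because the reward equals the action taken, the expected reward earned by the action in cycle k is simply the probability of choosing action 1 in that cycle. The function names are hypothetical.

```python
from fractions import Fraction


def pi_1(k: int) -> Fraction:
    """Probability that agent pi_1 takes action 1 in cycle k."""
    return Fraction(1, 2)


def pi_2(k: int) -> Fraction:
    """Probability that agent pi_2 takes action 1 in cycle k (assumed schedule)."""
    if k <= 100:
        return Fraction(0)       # always action 0
    if k <= 5000:
        return Fraction(1)       # always action 1
    return Fraction(1, 2)        # uniformly random


def average_reward(agent, first: int, last: int) -> Fraction:
    """Mean expected reward earned by the actions in cycles first..last."""
    total = sum(agent(k) for k in range(first, last + 1))
    return total / (last - first + 1)
```

Evaluating `average_reward` over cycles 1 to 100, 101 to 5000, and beyond 5000 reproduces the three regimes described above: the first agent leads early, the second leads in the medium term, and the two tie in the long run.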

The standard way of formalising this in reinforcement learning is to
assume that the value of rewards decays geometrically into the future
at a rate given by a discount parameter $\gamma \in (0, 1)$, that is,

$$V^\pi_\mu(\gamma) := \frac{1}{\Gamma} \mathbf{E}\left( \sum_{i=1}^{\infty} \gamma^i r_i \right) \qquad (1)$$

where $r_i$ is the reward in cycle $i$ of a given history, the
normalising constant is $\Gamma := \sum_{i=1}^{\infty} \gamma^i$ and
the expected value is taken over all histories of $\pi$ and $\mu$
interacting. By increasing $\gamma$ towards 1 we weight long term
rewards more heavily, conversely by reducing it we balance the
weighting towards short term rewards.
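
The effect of the discount parameter can be seen in a small numerical sketch. The function `discounted_value` below is a hypothetical helper that evaluates the normalised discounted sum for a finite list of expected rewards, treating every later reward as zero, so it is a truncation of the infinite sum rather than the value function itself. The two reward schedules are invented for illustration.

```python
def discounted_value(expected_rewards, gamma: float) -> float:
    """Normalised geometric discounting over a finite reward list.

    expected_rewards[i - 1] is the expected reward in cycle i; cycles
    beyond the end of the list are treated as giving zero reward.
    """
    weighted = sum(gamma ** i * r
                   for i, r in enumerate(expected_rewards, start=1))
    normaliser = gamma / (1.0 - gamma)  # closed form of sum_{i>=1} gamma^i
    return weighted / normaliser


# Two invented schedules: reward only early, or only late.
EARLY = [1.0] * 10 + [0.0] * 990
LATE = [0.0] * 10 + [1.0] * 990
```

With a small discount parameter such as 0.5 the early schedule scores far higher, while with 0.99 the late schedule wins, which is exactly the dependence on a free parameter that the following paragraphs set out to remove.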

Of course this has not actually answered the question of how to
weight near term rewards versus longer term rewards. Rather it has
simply expressed this weighting as a parameter. While that is
adequate for some purposes, what we would like is a single test of
intelligence for machines, not a range of tests that vary according
to some free parameter. That is, we would like the temporal
preferences to be included in the model, not external to it.

One possibility might be to use harmonic discounting,
$\gamma_t := \frac{1}{t^2}$. This has some nice properties, in
particular the agent needs to look forward into the future in a way
that is proportional to its current age [Hut04]. However an even more
elegant solution is possible.

If we look at the value function in Equation (1) we see that
geometric discounting plays two roles. Firstly, it normalises the
total reward received which makes the sum finite, in this case with a
maximum value of 1. Secondly, it weights the reward at different
points in the future which in effect defines a temporal preference.
We can solve both of these problems, without needing an external
parameter, by simply requiring that the total reward returned by the
environment cannot exceed 1. For a reward summable environment $\mu$
we can now define the value function to be simply,

$$V^\pi_\mu := \mathbf{E}\left( \sum_{i=1}^{\infty} r_i \right) \le 1. \qquad (2)$$

One way of viewing this is that the rewards re- 
turned by the environment now have the temporal 
preference factored in and thus we do not need to 
add this. The cost is that this is an additional con- 
dition that we place on the environments. Previ- 
ously we required that each reward signal was in a 
finite subset of $[0, 1] \cap \mathbb{Q}$, now we have the additional
constraint that the sum is bounded. 

It may seem that there is a philosophical problem 
here. If an environment $\mu$ is an artificial game, like
chess, then it seems fairly natural for $\mu$ to meet
any requirements in its definition, such as having a
bounded reward sum. However if we think of the
environment $\mu$ as being "the universe" in which the
agent lives, then it seems unreasonable to expect 
that it should be required to respect such a bound. 
The flaw in this argument is that a "universe" does 
not have any notion of reward for particular agents. 

Strictly speaking, reward is an interpretation of 
the state of the environment. In humans this is built 
in, for example, the pain that is experienced when 
you touch something hot. In which case, maybe 
it should really be a part of the agent rather than 
the environment? If we gave the agent complete 
control over rewards then our framework would be- 
come meaningless: The perfect agent could simply 
give itself constant maximum reward. Indeed hu- 
mans cannot easily do this either, at least not with- 
out taking drugs designed to interfere with their 
pleasure-pain mechanism. 



Thus the most accurate framework would consist 
of an agent, an environment and a separate goal sys- 
tem that interpreted the state of the environment 
and rewarded the agent appropriately. In such a set 
up the bounded rewards restriction would be a part 
of the goal system and thus the above philosophical 
problem does not occur. However for our current 
purposes it seems sufficient just to fold this goal
mechanism into the environment and add an eas- 
ily implemented constraint to how the environment 
may generate rewards. 

5 A formal measure of intelligence

We have now formally defined agents and environments, how they
interact with each other, and how we measure the performance of an
agent in any specific environment. Before we can put all this
together into a single performance measure, we firstly need to define
what we mean by "a wide range of environments."

As our goal is to produce a measure of intelligence 
that is as broad and encompassing as possible, the 
space of environments used in our definition should 
be as large as possible. Given that our environment 
is a probability measure with a certain structure, an 
obvious possibility would be to consider the space 
of all probability measures of this form. Unfortu- 
nately, this extremely broad class of environments 
causes problems. As the space of all probability 
measures is uncountably infinite, we cannot list the 
members of this set, nor can we always describe en- 
vironments in a finite way. 

The solution is to require the environmental mea- 
sures to be computable. Not only is this necessary 
if we are to have an effective measure of intelligence, 
it is also not all that restrictive. There are an in- 
finite number of environments in this set, with no 
upper bound on their complexity. Furthermore, it is 
only the measure which describes the environment 
that must be computable. For example, although a typical sequence of
1's and 0's generated by flipping a coin is not computable, the
probability measure which describes this process is computable. Thus,
even environments which behave randomly are included in our space of
environments. This appears to be the largest reasonable space of
environments. Indeed, no physical system has ever been shown to lie
outside of this set. If such a physical system were found, it would
overturn the Church-Turing thesis and alter our view of the universe.

How can we combine the agent's performance 
over all these environments? As there are an infinite 
number of environments, we cannot simply take a 
uniform distribution over them. Mathematically,
we must weight some environments more highly
than others. If we consider the agent's perspective 
on the problem, this question is the same as asking: 
Given several different hypotheses which are consis- 
tent with the data, which hypothesis should be con- 
sidered the most likely? This is a frequently occur- 
ring problem in inductive inference where we must 
employ a philosophical principle to decide which 
hypothesis is the most likely. The most success- 
ful approach is to invoke the principle of Occam's 
razor: Given multiple hypotheses which are consis- 
tent with the data, the simplest should be preferred. 
This is generally considered the rational and intel- 
ligent thing to do. 

Consider for example the following type of question which commonly
appears in intelligence tests. There is a sequence such as 2, 4, 6,
8, and the test subject needs to predict the next number. Of course
the pattern is immediately clear: the numbers are increasing by 2
each time. An intelligent person would easily identify this pattern
and predict the next number to be 10. However, the polynomial
$2k^4 - 20k^3 + 70k^2 - 98k + 48$ is also consistent with the data,
in which case the next number in the sequence would be 58. Why then
do we consider the first answer to be more likely? It is because we
use, perhaps unconsciously, the principle of Occam's razor.
Furthermore, the fact that the test defines this as the correct
answer shows that it too embodies the concept of Occam's razor. Thus,
although we don't usually mention Occam's razor when defining
intelligence, the ability to effectively use Occam's razor is clearly
a part of intelligent behaviour.
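
The claim that both hypotheses fit the data can be verified directly. In the short Python check below, `linear_rule` and `quartic_rule` are hypothetical names for the two candidate explanations, with the quartic being the polynomial $2k^4 - 20k^3 + 70k^2 - 98k + 48$ evaluated at position $k = 1, 2, 3, \ldots$ in the sequence.

```python
def linear_rule(k: int) -> int:
    """The simple hypothesis: the k-th term is 2k."""
    return 2 * k


def quartic_rule(k: int) -> int:
    """The complex hypothesis: a degree-4 polynomial through the same points."""
    return 2 * k**4 - 20 * k**3 + 70 * k**2 - 98 * k + 48


OBSERVED = [2, 4, 6, 8]

linear_fit = [linear_rule(k) for k in range(1, 5)]
quartic_fit = [quartic_rule(k) for k in range(1, 5)]

# Both rules reproduce the observed data but disagree on the fifth term.
print(linear_fit == OBSERVED, quartic_fit == OBSERVED)
print(linear_rule(5), quartic_rule(5))
```

The quartic equals $2k + 2(k-1)(k-2)(k-3)(k-4)$, so it agrees with the linear rule on the first four terms and first departs from it at $k = 5$.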

Our formal measure of intelligence needs to re- 
flect this. Specifically, we need to test the agents in 
such a way that they are, at least on average, re- 
warded for correctly applying Occam's razor. For- 
mally, this means that our a priori distribution over 
environments should be weighted towards simpler 
environments. The problem now becomes: How 
should we measure the complexity of environments? 

As each environment is computable, it can be represented by a
program, or more formally, a binary string $p \in \mathbb{B}^*$ on
some prefix universal Turing machine $U$. Thus we can use Kolmogorov
complexity to measure the complexity of an environment $\mu \in E$,

$$K(\mu) := \min_p \{ |p| : U(p) \text{ computes } \mu \}.$$

This measure is independent of the choice of $U$ up to an additive
constant that is independent of $\mu$, thus we simply pick one
universal Turing machine $U$ and fix it. The correct way to turn this
into a prior distribution is by taking $2^{-K(\mu)}$. This is known
as the algorithmic probability distribution and it has a number of
important properties, particularly in the context of universally
optimal learning agents. See [LV97] or [Hut04] for an overview of
Kolmogorov complexity and universal prior distributions.
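
Kolmogorov complexity itself is not computable, so any concrete experiment has to fall back on a computable stand-in. The sketch below uses the length of a zlib-compressed textual description as such a stand-in. This is an assumption of the sketch, not the definition used in this paper: a general-purpose compressor only gives a crude, machine-dependent estimate of description length. The function names are hypothetical, and the prior weight is returned as a base-2 logarithm to avoid floating point underflow.

```python
import random
import zlib


def description_length_bits(description: str) -> int:
    """Crude computable proxy for description length: compressed size in bits."""
    compressed = zlib.compress(description.encode("utf-8"), 9)
    return 8 * len(compressed)


def log2_prior_weight(description: str) -> int:
    """log2 of the weight 2^(-length) that the proxy assigns to a description."""
    return -description_length_bits(description)


# A highly regular description versus an irregular one of equal length.
REGULAR = "01" * 500
_rng = random.Random(0)
IRREGULAR = "".join(_rng.choice("01") for _ in range(1000))
```

Under this proxy the regular description is much shorter than the irregular one and therefore receives a far larger prior weight, which is the qualitative behaviour that a simplicity-weighted prior is meant to have.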

Putting this all together, we can now define our formal measure of
intelligence for arbitrary systems. Let $E$ be the space of all
programs that compute environmental measures of summable reward with
respect to a prefix universal Turing machine $U$, and let $K$ be the
Kolmogorov complexity function. The intelligence of an agent $\pi$ is
defined as,

$$\Upsilon(\pi) := \sum_{\mu \in E} 2^{-K(\mu)} V^\pi_\mu = V^\pi_\xi,$$

where $\xi := \sum_{\mu \in E} 2^{-K(\mu)} \mu$ due to the linearity
of $V$. $\xi$ is the Solomonoff-Levin universal a priori distribution
generalised to reactive environments.
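
The structure of this definition, a complexity-weighted sum of per-environment values, can be illustrated on a tiny finite set of environments. Everything numeric in the sketch below is invented: the environment names, their complexities in bits and the agents' values are assumptions chosen for illustration, whereas the real measure sums over all computable reward-summable environments and cannot be evaluated this way. The function name `toy_intelligence` is likewise hypothetical.

```python
from fractions import Fraction


def toy_intelligence(values, complexities) -> Fraction:
    """Finite sum of 2^(-K(mu)) * V_mu over a hand-picked set of environments."""
    return sum(Fraction(1, 2 ** complexities[name]) * values[name]
               for name in values)


# Invented complexities (in bits) for three hypothetical environments.
COMPLEXITIES = {"constant": 3, "alternating": 5, "chess": 40}

# Invented per-environment values, each in [0, 1].
SPECIALIST = {"constant": Fraction(1, 10),
              "alternating": Fraction(1, 10),
              "chess": Fraction(9, 10)}
GENERALIST = {"constant": Fraction(8, 10),
              "alternating": Fraction(7, 10),
              "chess": Fraction(1, 10)}
```

Because the weight of an environment halves with every additional bit of complexity, strong performance in one complex environment contributes almost nothing, while modest performance across simple environments dominates the sum.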

6 Properties of the intelligence measure

To better understand the performance of this mea- 
sure consider some example agents. 

A random agent. The agent with the lowest intelligence, at least
among those that are not actively trying to perform badly, would be
one that makes uniformly random actions. We will call this
$\pi^{rand}$. In general such an agent will not be very successful as
it will fail to exploit any regularities in the environment, no
matter how simple they are. It follows then that the values of
$V^{\pi^{rand}}_\mu$ will typically be low compared to other agents,
and thus $\Upsilon(\pi^{rand})$ will be low.

A very specialised agent. From the equation for $\Upsilon$, we see
that an agent could have very low intelligence but still perform
extremely well at a few very specific and complex tasks. Consider,
for example, IBM's Deep Blue chess supercomputer, which we will
represent by $\pi^{dblue}$. When $\mu^{chess}$ describes the game of
chess, $V^{\pi^{dblue}}_{\mu^{chess}}$ is very high. However
$2^{-K(\mu^{chess})}$ is small, and for $\mu \neq \mu^{chess}$ the
value function will be low relative to other agents as $\pi^{dblue}$
only plays chess. Therefore, the value of $\Upsilon(\pi^{dblue})$
will be very low. Intuitively, this is because Deep Blue is too
inflexible and narrow to have general intelligence.

A general but simple agent. Imagine an agent that does very basic
learning by building up a table of observation and action pairs and
keeping statistics on the rewards that follow. Each time an
observation that has been seen before occurs, the agent takes the
action with highest estimated expected reward in the next cycle with
90% probability, or a random action with 10% probability. We will
call this agent $\pi^{basic}$. It is immediately clear that many
environments, both complex and very simple, will have at least some
structure that such an agent would take advantage of. Thus for almost
all $\mu$ we will have $V^{\pi^{basic}}_\mu > V^{\pi^{rand}}_\mu$ and
so $\Upsilon(\pi^{basic}) > \Upsilon(\pi^{rand})$. Intuitively, this
is what we would expect as $\pi^{basic}$, while very simplistic, is
surely more intelligent than $\pi^{rand}$.
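
One possible reading of this table-based agent is sketched below in Python. The class name `BasicAgent`, the helper `run_echo` and the choice to act randomly on observations that have not been seen before are assumptions of the sketch, since the description above leaves those details open. The test environment simply returns the agent's previous action as its reward, with an empty observation.

```python
import random
from collections import defaultdict


class BasicAgent:
    """Table of (observation, action) reward statistics with 10% random actions."""

    def __init__(self, actions, explore: float = 0.1, seed: int = 0):
        self.actions = list(actions)
        self.explore = explore
        self.rng = random.Random(seed)
        self.total = defaultdict(float)  # summed reward per (obs, action)
        self.count = defaultdict(int)    # visits per (obs, action)
        self.last = None                 # pair still waiting for its reward

    def estimate(self, observation, action) -> float:
        n = self.count[(observation, action)]
        return self.total[(observation, action)] / n if n else 0.0

    def act(self, observation, reward: float):
        # The incoming reward is credited to the previous (obs, action) pair.
        if self.last is not None:
            self.total[self.last] += reward
            self.count[self.last] += 1
        seen = any(self.count[(observation, a)] for a in self.actions)
        if seen and self.rng.random() >= self.explore:
            # Greedy choice: highest estimated reward in the next cycle.
            action = max(self.actions,
                         key=lambda a: self.estimate(observation, a))
        else:
            action = self.rng.choice(self.actions)
        self.last = (observation, action)
        return action


def run_echo(agent, cycles: int) -> float:
    """Average reward in a toy environment whose reward echoes the last action."""
    reward, total = 0.0, 0.0
    for _ in range(cycles):
        action = agent.act("", reward)
        reward = float(action)
        total += reward
    return total / cycles
```

In the echo environment a uniformly random agent earns 0.5 per cycle on average, whereas this agent settles on action 1 once it has tried it and then earns close to 0.95, in line with the ordering argued for above.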

A simple agent with more history. A natural extension of
$\pi^{basic}$ is to use a longer history of actions, observations and
rewards in its internal table. Let $\pi^{2back}$ be the agent that
builds a table of statistics for the expected reward conditioned on
the last two actions, rewards and observations. It is immediately
clear that $\pi^{2back}$ is a generalisation of $\pi^{basic}$ by
definition and thus will adapt to any regularity that $\pi^{basic}$
can adapt to. It follows then that in general
$V^{\pi^{2back}}_\mu > V^{\pi^{basic}}_\mu$ and so
$\Upsilon(\pi^{2back}) > \Upsilon(\pi^{basic})$, as we would
intuitively expect.

In a similar way agents of increasing complex- 
ity and adaptability can be defined which will have 
still greater intelligence. However with more com- 
plex agents it is usually difficult to theoretically es- 
tablish whether one agent has more or less intel- 
ligence than another. Nevertheless, it is hopefully 
clear from these simple examples that the more flex- 
ible and powerful an agent is, the higher its machine 
intelligence. 

A human. For extremely simple environments, a 
human should be able to identify their simple struc- 
ture and exploit this to maximise reward. For more 
complex environments however it is hard to know 
how well a human would perform without experi- 
mental results. 

Super-human intelligence. It can be easily proven that the
theoretical AIXI agent [Hut04] is the maximally intelligent agent
with respect to $\Upsilon$. AIXI has been proven to have many
universal optimality properties, including being Pareto optimal and
self-optimising in any environment in which this is possible for a
general agent. Thus it is clear that agents with very high $\Upsilon$
must be extremely powerful.

In addition to sensibly ordering many simple 
learning agents, this formal definition has many sig- 
nificant and desirable properties: 

Valid. The most important property of a measure
of intelligence is that it does indeed measure
"intelligence". As T formalises a mainstream informal
definition, we believe that it is a valid measure.

Meaningful. An agent with a high T value must 
perform well over a very wide range of environ- 
ments, in particular it must perform well in almost 
all simple environments. If such an agent existed, it
would clearly be very powerful and practically use- 
ful. It also sensibly orders the intelligence of simple 
learning agents. 

Repeatable. We can test an agent using T
repeatedly without problem. This is because it is
defined across all well-defined environments, not just
a specific test subset which an agent might adapt
to.

Absolute. T gives us a single real absolute value, 
unlike the pass-fail Turing test [Tur50]. This is
important if we want to make distinctions between
similar learning algorithms that are not close to hu- 
man level intelligence. 

Wide range. As we have seen, T can measure per- 
formance from extremely simple agents right up to 
the super powerful AIXI agent. Other tests cannot
handle such an enormous range.

General. The test is clearly non-specific to the
implementation of the agent as the inner workings
of the agent are left completely undefined. It is also
very general in terms of what senses or actuators
the agent might have as all information exchanged
between the agent and the environment takes place
over basic Shannon-like communication channels.

Dynamic. One aspect of our test of intelligence 
is that it is, in the terminology of intelligence
testing, a highly dynamic test [SG02]. Normally
intelligence tests for humans only test the ability to
solve one-off problems. There are no dynamic as- 
pects to the test where the test subject has to in- 
teract with something and learn and adapt their 
behaviour accordingly. This makes it very hard to 
test things like the individual's ability to quickly 
pick up new skills and adapt to new situations. One 
way to overcome these problems is to use more so- 
phisticated dynamic tests. In these tests there is 
an active tester who constantly interacts with the 
test subject, much like what happens in our formal 
intelligence measure. 

Unbiased. The test is not weighted towards abil- 
ity in certain specific kinds of areas or problems, 
rather it is simply weighted towards simpler envi- 
ronments no matter what they are. 

Fundamental. The test is based on the theory 
of information, Turing computation and complexity 
theory. These are all fundamental ideas which are 
likely to remain very stable over time irrespective 
of changes in technology. 

Formal. Unlike many tests of intelligence, T is
specified completely formally and mathematically.

Objective. Unlike the Turing test which requires 
a panel of judges to decide if an agent is intelligent 
or not, T is free of such subjectivity.

Our definition of intelligence also has some
weaknesses. One is the fact that the environmental
distribution $2^{-K(\mu)}$ that we have used is invariant,
up to a multiplicative constant, to changes in the
reference machine U. While this affords us some
protection, it still means that the relative intelli- 
gence of agents can change if we change our refer- 
ence machine. One approach to this problem might 
be to limit the complexity of the reference machine, 
for example by limiting its state-symbol complexity.
We expect that for highly intelligent machines
that can deal with a wide range of environments 
of varying complexity, the effect of changing from 
one simple reference machine to another will be mi- 
nor. For agents which are less complex than the 
reference machine however, such a change could be 
significant. 
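
The dependence on the reference machine can be made explicit with the standard invariance theorem for Kolmogorov complexity. The derivation below is a sketch under stated assumptions: it takes the measure to be the $2^{-K(\mu)}$-weighted sum of the values $V^\pi_\mu$, assumes these values are non-negative, and introduces the subscripted notation $K_U$ and $T_U$ (not used elsewhere in the text) to mark the reference machine.

```latex
% Invariance theorem: for reference machines U and U' there exist constants
% c and c', depending only on U and U', such that for every environment \mu
K_U(\mu) \le K_{U'}(\mu) + c, \qquad K_{U'}(\mu) \le K_U(\mu) + c'.
% Hence the environment weights agree up to multiplicative constants:
2^{-c}\, 2^{-K_{U'}(\mu)} \;\le\; 2^{-K_U(\mu)} \;\le\; 2^{c'}\, 2^{-K_{U'}(\mu)}.
% Assuming V^\pi_\mu \ge 0 and summing over \mu gives, for every agent \pi,
2^{-c}\, T_{U'}(\pi) \;\le\; T_U(\pi) \;\le\; 2^{c'}\, T_{U'}(\pi).
```

These bounds constrain the score of each agent separately. They do not fix the ordering of two agents whose scores lie within those constant factors of each other, which is why the relative intelligence of agents can change with the reference machine.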

A theoretical problem is that our distribution 
over environments is not computable. While this 
is fine for a theoretical definition of intelligence, 
it makes the measure impossible to directly im- 
plement. The solution is to use a more tractable 
measure of complexity such as Levin's Kt complexity
[Lev73] or Schmidhuber's Speed prior [Sch02].
Both of these consider the complexity of an al- 
gorithm to be determined by both its description 
length and running time. Intuitively it also makes 
good sense, because we would not usually consider 
a very short algorithm that takes an enormous 
amount of time to compute, to be a particularly 
simple one. 
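
To illustrate how a Kt-style weighting trades description length against running time, the following sketch scores an agent over a small, finite list of environments. Everything here is an assumption made for illustration: the `SampledEnvironment` record, its field names and the idea of a hand-supplied list are hypothetical, and because each environment is given with a single program, the cost computed is only an upper bound on Kt, which is defined as a minimum over all programs.

```python
import math
from dataclasses import dataclass


@dataclass
class SampledEnvironment:
    description_bits: int  # length in bits of one program for the environment
    runtime_steps: int     # running time of that program, at least 1
    value: float           # agent's expected reward in this environment


def kt_cost(env):
    """Levin-style cost: description length plus log2 of running time."""
    return env.description_bits + math.log2(env.runtime_steps)


def approximate_score(envs):
    """Finite stand-in for the weighted sum over environments,
    using 2^(-cost) in place of the incomputable 2^(-K) weights."""
    return sum(2.0 ** (-kt_cost(e)) * e.value for e in envs)
```

For example, two environments with the same 2-bit description but running times of 1 and 4 steps receive weights of 2^-2 and 2^-4, so the slower one contributes a quarter as much to the score.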

The only closely related work to ours is the
C-Test [HO00]. While our intelligence measure is fully
dynamic and interactive, the C-Test is a purely 
static sequence prediction test similar to standard 
IQ tests for humans. The C-Test always ensures 
that each question has an unambiguous answer in 
the sense that there is always one consistent hypoth- 
esis with significantly lower complexity than the al- 
ternatives. Perhaps this is useful for some kinds of 
tests, but we believe that it is unrealistic and limit- 
ing. Like our intelligence test, the C-Test also has 
to deal with the problem of the incomputability of 
Kolmogorov complexity. By using Levin's Kt com- 
plexity, the C-Test was able to compute a number of 
test problems which were used to test humans. The 
"compression test" [Mah99] for machine intelligence
is similarly restricted to sequence prediction. We
consider the linguistic complexity tests of
Treister-Goren et al. to be far too narrow. The
psychometric approach of Bringsjord and Schimanski is only
appropriate if the machine has a sufficiently human- 
like intelligence. 

7 Conclusions 

Given the obvious significance of formal definitions 
of intelligence for research, and calls for more di- 
rect measures of machine intelligence to replace the 
problematic Turing test and other imitation-based
tests [Joh92], very little work has been done in this
area. In this paper we have attempted to tackle 
this problem head on. Although the test has a few 
weaknesses, it also has many unique strengths. In 
particular, we believe that it expresses the essentials
of machine intelligence in an elegant and powerful
way. Furthermore, more tractable measures of com- 
plexity should lead to practical tests based on this 
theoretical model. 